First part

data filter and normalization

source("./utils/tianfengRwrappers.R")
Loading required package: reticulate
Loading required package: tidyr

Attaching package: ‘tidyr’

The following object is masked from ‘package:S4Vectors’:

    expand


Attaching package: ‘MySeuratWrappers’

The following objects are masked from ‘package:Seurat’:

    DimPlot, DoHeatmap, LabelClusters, RidgePlot, VlnPlot


Attaching package: ‘cowplot’

The following object is masked from ‘package:ggpubr’:

    get_legend

Loading required package: viridisLite

Attaching package: ‘reshape2’

The following object is masked from ‘package:tidyr’:

    smiths

NOTE: Either Arial Narrow or Roboto Condensed fonts are required to use these themes.
      Please use hrbrthemes::import_roboto_condensed() to install Roboto Condensed and
      if Arial Narrow is not on your system, please see https://bit.ly/arialnarrow

clusterProfiler v3.16.1  For help: https://guangchuangyu.github.io/software/clusterProfiler

If you use clusterProfiler in published research, please cite:
Guangchuang Yu, Li-Gen Wang, Yanyan Han, Qing-Yu He. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS: A Journal of Integrative Biology. 2012, 16(5):284-287.

Attaching package: ‘clusterProfiler’

The following object is masked from ‘package:DelayedArray’:

    simplify

The following object is masked from ‘package:IRanges’:

    slice

The following object is masked from ‘package:S4Vectors’:

    rename

The following object is masked from ‘package:stats’:

    filter

Registering fonts with R

Attaching package: ‘plotly’

The following object is masked from ‘package:ggplot2’:

    last_plot

The following object is masked from ‘package:IRanges’:

    slice

The following object is masked from ‘package:S4Vectors’:

    rename

The following object is masked from ‘package:stats’:

    filter

The following object is masked from ‘package:graphics’:

    layout

Loading required package: e1071
Warning: couldn't connect to display ":0"
Attaching package: ‘widgetTools’

The following object is masked from ‘package:dplyr’:

    funs


Attaching package: ‘DynDoc’

The following object is masked from ‘package:DelayedArray’:

    path

The following object is masked from ‘package:BiocGenerics’:

    path


Attaching package: ‘IDPmisc’

The following object is masked from ‘package:shiny’:

    hr


Attaching package: ‘DT’

The following objects are masked from ‘package:shiny’:

    dataTableOutput, renderDataTable

The following object is masked from ‘package:SeuratObject’:

    JS

The following object is masked from ‘package:Seurat’:

    JS

========================================
circlize version 0.4.15
CRAN page: https://cran.r-project.org/package=circlize
Github page: https://github.com/jokergoo/circlize
Documentation: https://jokergoo.github.io/circlize_book/book/

If you use it in published research, please cite:
Gu, Z. circlize implements and enhances circular visualization
  in R. Bioinformatics 2014.

This message can be suppressed by:
  suppressPackageStartupMessages(library(circlize))
========================================

Loading required package: grid
========================================
ComplexHeatmap version 2.4.3
Bioconductor page: http://bioconductor.org/packages/ComplexHeatmap/
Github page: https://github.com/jokergoo/ComplexHeatmap
Documentation: http://jokergoo.github.io/ComplexHeatmap-reference

If you use it in published research, please cite:
Gu, Z. Complex heatmaps reveal patterns and correlations in multidimensional 
  genomic data. Bioinformatics 2016.

This message can be suppressed by:
  suppressPackageStartupMessages(library(ComplexHeatmap))
========================================


Attaching package: ‘ComplexHeatmap’

The following object is masked from ‘package:plotly’:

    add_heatmap

Loading required package: AnnotationDbi

Attaching package: ‘AnnotationDbi’

The following object is masked from ‘package:plotly’:

    select

The following object is masked from ‘package:clusterProfiler’:

    select

The following object is masked from ‘package:dplyr’:

    select

From cellranger

QC

fly_merge <- CreateSeuratObject(Read10X("./raw_data_file/fly_merge/"), names.field = 2, names.delim = "-",
                                   project = "fly_merge", min.cells = 10, min.features = 200)

fly_merge <- PercentageFeatureSet(fly_merge, pattern = "^mt:", col.name = "percent.mt")

VlnPlot(fly_merge,c("percent.mt"))/
VlnPlot(fly_merge,c("nFeature_RNA"))/
VlnPlot(fly_merge,c("nCount_RNA"))


fly_merge <- subset(fly_merge, subset = nFeature_RNA > 300 & nFeature_RNA < 4000 & 
        nCount_RNA > 1000 &  nCount_RNA < 20000 & percent.mt < 10)
fly_merge <- fly_merge %>% SCTransform(vars.to.regress = "percent.mt", verbose = F) %>% 
  RunPCA() %>% FindNeighbors(dims = 1:20) %>% 
  RunUMAP(dims = 1:20) %>% 
  FindClusters(resolution = 0.1)
PC_ 1 
Positive:  regucalcin, CG31777, CG8501, CG5399, Gal, CG3940, CG14629, Gs1, Glt, CG4250 
       fat-spondin, Arpc3B, cer, lectin-28C, Hsp23, Ama, CG44250, Npc2a, CG31337, CG42566 
       Col4a1, scf, CG34296, fax, CG4950, Hexo2, SPARC, firl, CG10131, lectin-24Db 
Negative:  Phk-3, Gasp, ImpE2, ImpE1, Spn43Aa, CG15353, ImpL1, hui, CG2852, Spn100A 
       CG12009, Antp, obst-B, eEF1alpha1, RpL41, CG2016, serp, Fas2, CG9691, hth 
       CG2663, dsx-c73A, Fas3, pio, ogre, CG31997, CG11370, Cpr51A, Cyp1, betaTub56D 
PC_ 2 
Positive:  Gasp, Spn43Aa, CG12009, Cpr51A, Osi15, CG15353, CG11370, obst-B, CG7220, CG2016 
       CAH2, dpy, serp, l(2)34Fc, mt:lrRNA, Tsf1, peb, CG30154, sn, betaTub60D 
       CG8854, uif, Osi20, CG30197, form3, CG9782, obst-A, vvl, CG2852, lncRNA:CR44811 
Negative:  Phk-3, ImpL1, ImpE1, ImpE2, dsx-c73A, Antp, CG13737, Spn100A, CG34232, GstD1 
       hui, CG2663, CG16798, ct, CG32354, vir-1, CG9691, qtc, Ect3, Cpr47Eb 
       frm, Mur89F, Oaz, Cad88C, CG31637, tyn, slbo, lncRNA:CR43432, Cyp18a1, tnc 
PC_ 3 
Positive:  Gasp, CG2663, Spn100A, Cpr51A, Osi15, CG34232, CG13737, CG4702, CG16798, CG12009 
       dsx-c73A, ect, CG15353, Phk-3, qtc, CG32354, Ect3, CG2016, vir-1, Cad88C 
       Spn43Aa, CG7220, Antp, lncRNA:CR43314, Nep2, Cpr47Eb, Ldh, obst-A, CAH2, CG2852 
Negative:  Him, E(spl)m6-BFM, E(spl)m7-HLH, E(spl)mbeta-HLH, CG11835, E(spl)malpha-BFM, E(spl)m2-BFM, twi, E(spl)m8-HLH, hdc 
       hth, eEF1alpha2, E(spl)mgamma-HLH, ImpE2, vg, E(spl)m3-HLH, CG9650, Ppn, stg, E(spl)mdelta-HLH 
       side, Act57B, zfh1, Act87E, Pen, beat-IIIc, bib, smal, Hsp26, Sp1 
PC_ 4 
Positive:  CG34232, CG2663, CG13737, Him, E(spl)m6-BFM, qtc, ect, CG16798, Cad88C, CG32354 
       Osi15, Ect3, Cpr51A, CG4702, Gasp, dsx-c73A, slbo, Antp, CG10737, E(spl)m7-HLH 
       CG11835, Cpr47Eb, bol, SPARC, cpo, Oaz, Ldh, CG7724, twi, betaTub97EF 
Negative:  Phk-3, CG15353, Idgf4, CG9691, GstD1, ImpE2, Mmp1, Nep2, ImpE1, CG11370 
       Reg-5, l(2)34Fc, tnc, CG4928, ple, Cpr12A, frm, Spn43Aa, b, AANAT1 
       Cpr49Ac, ImpL1, CG9312, CG9192, Tsf1, Dl, Ance, Gs2, CG6770, Ddc 
PC_ 5 
Positive:  CG15353, CG11370, Spn100A, CG2663, CG34232, l(2)34Fc, Tsf1, Nep2, Spn43Aa, CG2016 
       CG13737, Gs2, lncRNA:CR43314, Ddc, l(2)k05911, Idgf4, qtc, CG13748, CG7888, Mmp1 
       CG13639, CG13082, bnl, obst-B, CG16798, CG3502, CG32354, Cad88C, SP1029, obst-A 
Negative:  Phk-3, ImpE2, Gasp, Cpr51A, ImpL1, Osi15, ImpE1, GstD1, CG7220, CG12009 
       Osi20, hui, Dbi, lncRNA:CR43432, lncRNA:CR44811, betaTub60D, CG9689, pio, bs, CG4702 
       CG17121, CG1368, CG15887, tsr, mirr, sn, frm, Mur89F, Rtnl1, chic 
Computing nearest neighbor graph
Computing SNN
Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session10:33:01 UMAP embedding parameters a = 0.9922 b = 1.112
10:33:01 Read 18195 rows and found 20 numeric columns
10:33:01 Using Annoy for neighbor search, n_neighbors = 30
10:33:01 Building Annoy index with metric = cosine, n_trees = 50
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
10:33:04 Writing NN index file to temp file /tmp/RtmpLc7AED/file1099d32841635
10:33:04 Searching Annoy index using 1 thread, search_k = 3000
10:33:09 Annoy recall = 100%
10:33:10 Commencing smooth kNN distance calibration using 1 thread
10:33:12 Initializing from normalized Laplacian + noise
10:33:13 Commencing optimization for 200 epochs, with 769526 positive edges
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
10:33:21 Optimization finished
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 18195
Number of edges: 596302

Running Louvain algorithm...
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.9703
Number of communities: 11
Elapsed time: 1 seconds
table(fly_merge$orig.ident)

    1     2     3 
10191   989  7015 
trachea <- subset(fly_merge, idents = c(0:3,6:8))
trachea <- trachea %>% 
  RunPCA() %>% FindNeighbors(dims = 1:20) %>% 
  RunUMAP(dims = 1:20)
DimPlot(trachea, cells.highlight = WhichCells(larva))
trachea <- trachea %>% FindClusters(resolution = 0.2)

trachea$ctrl_idents <- larva$seurat_clusters
fly_merge$ctrl_idents <- larva$seurat_clusters

ctrl <- subset(fly_merge_filtered, orig.ident == 1)
# saveRDS(ctrl,"./processed_data_file/ctrl.rds")

HSD <- subset(fly_merge_filtered, orig.ident == 2 | orig.ident == 3)
# saveRDS(HSD,"./processed_data_file/HSD.rds")

run harmony

library(harmony)
umapplot(trachea, split.by = "orig.ident")


trachea_harmony <- RunHarmony(trachea, "orig.ident", plot_convergence = T, project.dim = F, assay.use = "SCT")
trachea_harmony@reductions[["pca"]] <- trachea_harmony@reductions[["harmony"]]


trachea_harmony <- trachea_harmony %>% RunUMAP(dims = 1:20)
trachea_harmony <- FindClusters(trachea_harmony, resolution = 0.25)
umapplot(trachea_harmony)
umapplot(trachea_harmony, group.by = "ctrl_idents")
# saveRDS(fly_merge_harmony,"./processed_data_file/fly_merge_harmony.rds") #已经经过分组处理了
ctrl_filtered <- NormalizeData(ctrl_filtered, assay = "RNA") %>% ScaleData(assay = "RNA")
Performing log-normalization
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Centering and scaling data matrix

  |                                                                                       
  |                                                                                 |   0%
  |                                                                                       
  |=======                                                                          |   8%
  |                                                                                       
  |==============                                                                   |  17%
  |                                                                                       
  |====================                                                             |  25%
  |                                                                                       
  |===========================                                                      |  33%
  |                                                                                       
  |==================================                                               |  42%
  |                                                                                       
  |========================================                                         |  50%
  |                                                                                       
  |===============================================                                  |  58%
  |                                                                                       
  |======================================================                           |  67%
  |                                                                                       
  |=============================================================                    |  75%
  |                                                                                       
  |====================================================================             |  83%
  |                                                                                       
  |==========================================================================       |  92%
  |                                                                                       
  |=================================================================================| 100%
HSD_filtered <- NormalizeData(HSD_filtered, assay = "RNA") %>% ScaleData(assay = "RNA")
Performing log-normalization
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Centering and scaling data matrix

  |                                                                                       
  |                                                                                 |   0%
  |                                                                                       
  |=======                                                                          |   8%
  |                                                                                       
  |==============                                                                   |  17%
  |                                                                                       
  |====================                                                             |  25%
  |                                                                                       
  |===========================                                                      |  33%
  |                                                                                       
  |==================================                                               |  42%
  |                                                                                       
  |========================================                                         |  50%
  |                                                                                       
  |===============================================                                  |  58%
  |                                                                                       
  |======================================================                           |  67%
  |                                                                                       
  |=============================================================                    |  75%
  |                                                                                       
  |====================================================================             |  83%
  |                                                                                       
  |==========================================================================       |  92%
  |                                                                                       
  |=================================================================================| 100%

END

celltype annotation

LS0tCnRpdGxlOiAiRGF0YWxvYWQgYW5kIHByb2Nlc3MiCm91dHB1dDogaHRtbF9ub3RlYm9vawotLS0KCiMgRmlyc3QgcGFydAojIyBkYXRhIGZpbHRlciBhbmQgbm9ybWFsaXphdGlvbgoKYGBge3Igc2V0dXAsIGluY2x1ZGU9RkFMU0V9CmdldHdkKCkKa25pdHI6Om9wdHNfa25pdCRzZXQocm9vdC5kaXIgPSAiL2hvbWUvemp1L3RpYW5mZW5nL3NpbmdsZV9jZWxsX2F0bGFzIikKIyBzZXR3ZCgiL2hvbWUvemp1L3RpYW5mZW5nL3NpbmdsZV9jZWxsX2F0bGFzIikKIyBnZXR3ZCgpCmBgYAoKCmBgYHtyfQpzb3VyY2UoIi4vdXRpbHMvdGlhbmZlbmdSd3JhcHBlcnMuUiIpCmBgYAoKCiMgRnJvbSBjZWxscmFuZ2VyCiMjIFFDCmBgYHtyIGZpZy53aWR0aD00LCBmaWcuaGVpZ2h0PTEwfQpmbHlfbWVyZ2UgPC0gQ3JlYXRlU2V1cmF0T2JqZWN0KFJlYWQxMFgoIi4vcmF3X2RhdGFfZmlsZS9mbHlfbWVyZ2UvIiksIG5hbWVzLmZpZWxkID0gMiwgbmFtZXMuZGVsaW0gPSAiLSIsIHByb2plY3QgPSAiZmx5X21lcmdlIiwgbWluLmNlbGxzID0gMTAsIG1pbi5mZWF0dXJlcyA9IDIwMCkKCmZseV9tZXJnZSA8LSBQZXJjZW50YWdlRmVhdHVyZVNldChmbHlfbWVyZ2UsIHBhdHRlcm4gPSAiXm10OiIsIGNvbC5uYW1lID0gInBlcmNlbnQubXQiKQoKVmxuUGxvdChmbHlfbWVyZ2UsYygicGVyY2VudC5tdCIpKS8KVmxuUGxvdChmbHlfbWVyZ2UsYygibkZlYXR1cmVfUk5BIikpLwpWbG5QbG90KGZseV9tZXJnZSxjKCJuQ291bnRfUk5BIikpCgpmbHlfbWVyZ2UgPC0gc3Vic2V0KGZseV9tZXJnZSwgc3Vic2V0ID0gbkZlYXR1cmVfUk5BID4gMzAwICYgbkZlYXR1cmVfUk5BIDwgNDAwMCAmIAogICAgICAgIG5Db3VudF9STkEgPiAxMDAwICYgIG5Db3VudF9STkEgPCAyMDAwMCAmIHBlcmNlbnQubXQgPCAxMCkKYGBgCgpgYGB7ciBmaWcud2lkdGg9MTIsIGZpZy5oZWlnaHQ9Nn0KZmx5X21lcmdlIDwtIGZseV9tZXJnZSAlPiUgU0NUcmFuc2Zvcm0odmFycy50by5yZWdyZXNzID0gInBlcmNlbnQubXQiLCB2ZXJib3NlID0gRikgJT4lIAogIFJ1blBDQSgpICU+JSBGaW5kTmVpZ2hib3JzKGRpbXMgPSAxOjIwKSAlPiUgCiAgUnVuVU1BUChkaW1zID0gMToyMCkgJT4lIAogIEZpbmRDbHVzdGVycyhyZXNvbHV0aW9uID0gMC4xKQoKdGFibGUoZmx5X21lcmdlJG9yaWcuaWRlbnQpCgojIHNhdmVSRFMoZmx5X21lcmdlLCIuL3Byb2Nlc3NlZF9kYXRhX2ZpbGUvZmx5X21lcmdlLnJkcyIpCmBgYAoKYGBge3J9CnRyYWNoZWEgPC0gc3Vic2V0KGZseV9tZXJnZSwgaWRlbnRzID0gYygwOjMsNjo4KSkKdHJhY2hlYSA8LSB0cmFjaGVhICU+JSAKICBSdW5QQ0EoKSAlPiUgRmluZE5laWdoYm9ycyhkaW1zID0gMToyMCkgJT4lIAogIFJ1blVNQVAoZGltcyA9IDE6MjApCkRpbVBsb3QodHJhY2hlYSwgY2VsbHMuaGlnaGxpZ2h0ID0gV2hpY2hDZWxscyhsYXJ2YSkpCnRyYWNoZWEgPC0gdHJhY2hlYSAlPiUgRmluZENsdXN0ZXJzKHJlc29sdXRpb24gPSAwLjIpCgp0cmFjaGVhJGN0cmxfaWRlbnRzIDwtIGxhcnZhJHNldXJhdF9jbHVzdGVycwpmbHlfbWVyZ2UkY3RybF9pZGVudHMgPC0gbGFydmEkc2V1cmF0X2NsdXN0ZXJzCgpjdHJsIDwtIHN1YnNldChmbHlfbWVyZ2VfZmlsdGVyZWQsIG9yaWcuaWRlbnQgPT0gMSkKIyBzYXZlUkRTKGN0cmwsIi4vcHJvY2Vzc2VkX2RhdGFfZmlsZS9jdHJsLnJkcyIpCgpIU0QgPC0gc3Vic2V0KGZseV9tZXJnZV9maWx0ZXJlZCwgb3JpZy5pZGVudCA9PSAyIHwgb3JpZy5pZGVudCA9PSAzKQojIHNhdmVSRFMoSFNELCIuL3Byb2Nlc3NlZF9kYXRhX2ZpbGUvSFNELnJkcyIpCgpgYGAKCgojIHJ1biBoYXJtb255CmBgYHtyfQpsaWJyYXJ5KGhhcm1vbnkpCnVtYXBwbG90KHRyYWNoZWEsIHNwbGl0LmJ5ID0gIm9yaWcuaWRlbnQiKQoKCnRyYWNoZWFfaGFybW9ueSA8LSBSdW5IYXJtb255KHRyYWNoZWEsICJvcmlnLmlkZW50IiwgcGxvdF9jb252ZXJnZW5jZSA9IFQsIHByb2plY3QuZGltID0gRiwgYXNzYXkudXNlID0gIlNDVCIpCnRyYWNoZWFfaGFybW9ueUByZWR1Y3Rpb25zW1sicGNhIl1dIDwtIHRyYWNoZWFfaGFybW9ueUByZWR1Y3Rpb25zW1siaGFybW9ueSJdXQoKCnRyYWNoZWFfaGFybW9ueSA8LSB0cmFjaGVhX2hhcm1vbnkgJT4lIFJ1blVNQVAoZGltcyA9IDE6MjApCnRyYWNoZWFfaGFybW9ueSA8LSBGaW5kQ2x1c3RlcnModHJhY2hlYV9oYXJtb255LCByZXNvbHV0aW9uID0gMC4yNSkKdW1hcHBsb3QodHJhY2hlYV9oYXJtb255KQp1bWFwcGxvdCh0cmFjaGVhX2hhcm1vbnksIGdyb3VwLmJ5ID0gImN0cmxfaWRlbnRzIikKIyBzYXZlUkRTKGZseV9tZXJnZV9oYXJtb255LCIuL3Byb2Nlc3NlZF9kYXRhX2ZpbGUvZmx5X21lcmdlX2hhcm1vbnkucmRzIikgI+W3sue7j+e7j+i/h+WIhue7hOWkhOeQhuS6hgpgYGAKCmBgYHtyfQpjdHJsX2ZpbHRlcmVkIDwtIE5vcm1hbGl6ZURhdGEoY3RybF9maWx0ZXJlZCwgYXNzYXkgPSAiUk5BIikgJT4lIFNjYWxlRGF0YShhc3NheSA9ICJSTkEiKQpIU0RfZmlsdGVyZWQgPC0gTm9ybWFsaXplRGF0YShIU0RfZmlsdGVyZWQsIGFzc2F5ID0gIlJOQSIpICU+JSBTY2FsZURhdGEoYXNzYXkgPSAiUk5BIikKYGBgCgojIyBFTkQgIyMKY2VsbHR5cGUgYW5ub3RhdGlvbgo=